1. Introduction

One can argue that good principles of design for electronic computing systems may also be good principles for neuronal computing systems. Fodor (1975) argues that the mind is a stored-program computing system in which the programs are learned and the hardware is innately specified. Simon (1969) argues that mental functions must be nearly decomposable and hierarchically arranged. Fodor (1983) argues that the mind is modular, with one module for language, another for imagery, and so on. All three hypotheses (programmability, hierarchy, and modularity) are, or at least could be, supported by empirical tests. The mere analogy to computing systems is not by itself a convincing argument for a psychological principle, although it is suggestive.

Recently, it has become common to assume that the mind obeys a further principle of computing systems. The layering hypothesis is that the mind consists of layers of programmable machines. Each machine consists of memories and primitive processes whose operation is controlled by the program running on that machine. The machine in the top layer runs the "end user" program, which, roughly speaking, performs the task at hand. Every other machine runs a program that implements the memories and primitive processes of the machine above it in the stack of layers. The bottom machine is the actual neuronal "hardware." Perhaps connection systems are low in the stack, close to the neuronal layer but not identical to it. Further up the stack are, perhaps, spreading activation or marker-passing systems that employ "localist" or symbolic encodings. On top might be a general problem-solving architecture, such as SOAR (Laird, Rosenbloom, & Newell, 1986), or a special-purpose architecture for, e.g., processing language.

Although this picture of the mind is not new, a closer examination reveals that it is incomplete. A natural part of the layering principle in computer systems design is that the layers can be changed. In particular, any layer except the hardware layer can be replaced without changing the others. This essay examines the implications of the hypothesis that layers can be changed. Section 2 presents the computer science rationale for changeable layering, and section 3 extends its implications to the mind via the usual analogy between minds and computers.

What is missing from this brief, speculative essay is a proper defense of the changeable-layers hypothesis. In other work of this kind (e.g., Fodor, 1983; Fodor, 1975; Simon, 1969), hypotheses are supported by showing how they simplify the accounts of numerous psychological phenomena. No such support is ventured here. What follows is therefore pure speculation, although its relevance to contemporary debates in cognitive science makes it germane and perhaps even interesting.


2. Layers of Virtual Artificial Machines

It is standard to implement complex electronic computing systems in layers. The bottom layer is the machine's hardware. It usually consists of several types of memory with different capacities and decay rates. There are multiple data pathways and multiple special-purpose processors, often with different timing characteristics. There are usually several clocks and myriad gates (controlled switches) available to regulate the flow of data. Orchestrating this parallel maze of processing is a program written in a special language called "microcode." The microcode program is designed to execute another program, written in a language called "macrocode." The macrocode program operates on the assumption that there is just one processor and one (or a few) memories, and the microcode program makes this assumption valid. It implements a virtual processor and memories, or a virtual machine (VM), as it is standardly called. The virtual machine is much simpler than the real machine. Moreover, its simplicity is of the kind that enables simple macrocode programs to perform complex tasks.
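
To make the microcode/macrocode division concrete, here is a minimal Python sketch of only the memory side of a virtual machine. It assumes a hypothetical machine with two physical memories of different sizes (`RealHardware`); a hypothetical microcode-like layer (`VirtualMemory`) maps one flat address space onto them, so that the hypothetical macrocode-style program `macro_sum` can behave as if there were a single memory. Real microcode also coordinates processors, data paths, and clocks, all of which this sketch omits.

```python
class RealHardware:
    """Hypothetical 'real machine': two physical memories of different sizes."""

    def __init__(self, fast_size: int, slow_size: int) -> None:
        self.fast = [0] * fast_size  # small memory (e.g., registers or cache)
        self.slow = [0] * slow_size  # larger memory (e.g., main store)


class VirtualMemory:
    """Microcode-like layer: presents one flat address space over both memories."""

    def __init__(self, hw: RealHardware) -> None:
        self.hw = hw

    def size(self) -> int:
        return len(self.hw.fast) + len(self.hw.slow)

    def _locate(self, addr: int):
        # Map a virtual address onto (physical memory, offset within it).
        if 0 <= addr < len(self.hw.fast):
            return self.hw.fast, addr
        offset = addr - len(self.hw.fast)
        if 0 <= offset < len(self.hw.slow):
            return self.hw.slow, offset
        raise IndexError(f"virtual address {addr} out of range")

    def read(self, addr: int) -> int:
        memory, offset = self._locate(addr)
        return memory[offset]

    def write(self, addr: int, value: int) -> None:
        memory, offset = self._locate(addr)
        memory[offset] = value


def macro_sum(mem: VirtualMemory, n: int) -> int:
    """'Macrocode' program: stores 1..n, then sums them, seeing only one memory."""
    for addr in range(n):
        mem.write(addr, addr + 1)
    return sum(mem.read(addr) for addr in range(n))


hw = RealHardware(fast_size=4, slow_size=16)
print(macro_sum(VirtualMemory(hw), 10))  # 55
```

The macro program never learns that addresses 0 to 3 live in one memory and the rest in another; that bookkeeping is confined to `_locate`.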

On many computers, there are several layers of virtual machines. The lowest layers are written by the computer's manufacturer and come as part of the machine. The middle layers are usually an operating system, such as UNIX, and, on top of that, an interpreter for a programming language such as LISP or LOTUS 1-2-3. The user's program might itself implement another VM. For instance, a user's C program might implement a LISP interpreter, and the LISP program might implement an interpreter for a production system.
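
The last example, a program that implements an interpreter for a production system, can be sketched in a few lines. Here Python plays the part of the language layer underneath (the C or LISP layer in the text), and `run_production_system` is a hypothetical toy interpreter that implements one more VM on top of it: a working memory of facts plus a recognize-act cycle. The conflict-resolution rule (fire the first matching production that would add something new) is an assumption chosen for brevity.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Production(NamedTuple):
    name: str
    conditions: FrozenSet[str]  # facts that must all be in working memory
    additions: FrozenSet[str]   # facts the action adds to working memory


def run_production_system(productions: List[Production], memory: Set[str],
                          max_cycles: int = 100) -> List[str]:
    """Toy recognize-act interpreter; returns production names in firing order."""
    fired: List[str] = []
    for _ in range(max_cycles):
        # Match: conditions satisfied and the action would add something new.
        candidates = [p for p in productions
                      if p.conditions <= memory and not p.additions <= memory]
        if not candidates:
            break  # quiescence: no production can change working memory
        chosen = candidates[0]      # conflict resolution: first match in list order
        memory |= chosen.additions  # act: update working memory in place
        fired.append(chosen.name)
    return fired


rules = [
    Production("boil", frozenset({"have-water"}), frozenset({"hot-water"})),
    Production("steep", frozenset({"hot-water", "have-tea"}), frozenset({"tea"})),
]
wm = {"have-water", "have-tea"}
print(run_production_system(rules, wm))  # ['boil', 'steep']
```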

The layering of virtual machines is a good design principle because it permits very complicated tasks to be achieved with layers of simple programs. The programs that implement VMs, which are called interpreters, all have a similar structure: a single giant loop that runs endlessly. Inside the loop, the most important task is to fetch the next instruction in the macro program (the one being interpreted), then call the subroutines that implement that type of instruction. In addition, the loop may have some general housekeeping tasks that must be done constantly, such as managing memory or checking the state of peripheral devices.
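
The loop just described can be sketched as a toy stack-machine interpreter in Python. The instruction set (`PUSH`, `ADD`, `PRINT`, `HALT`) and the class `StackVM` are hypothetical, and the `housekeeping` method merely counts cycles as a stand-in for chores such as memory management or device polling. Unlike the endless loop in the text, this sketch stops on `HALT` so that it terminates.

```python
from typing import Callable, Dict, List

Instruction = tuple  # (opcode, *operands), e.g. ("PUSH", 2)


class StackVM:
    """Toy interpreter: fetch an instruction, dispatch to its handler, do housekeeping."""

    def __init__(self, program: List[Instruction]) -> None:
        self.program = program
        self.pc = 0  # index of the next instruction to fetch
        self.stack: List[int] = []
        self.output: List[int] = []
        self.halted = False
        self.cycles = 0
        # One handler per instruction type in the macro language.
        self.handlers: Dict[str, Callable[..., None]] = {
            "PUSH": self.op_push,
            "ADD": self.op_add,
            "PRINT": self.op_print,
            "HALT": self.op_halt,
        }

    def op_push(self, value: int) -> None:
        self.stack.append(value)

    def op_add(self) -> None:
        right, left = self.stack.pop(), self.stack.pop()
        self.stack.append(left + right)

    def op_print(self) -> None:
        self.output.append(self.stack[-1])

    def op_halt(self) -> None:
        self.halted = True

    def housekeeping(self) -> None:
        # Stand-in for memory management or device polling: just count cycles.
        self.cycles += 1

    def run(self) -> List[int]:
        while not self.halted:
            opcode, *operands = self.program[self.pc]  # fetch
            self.pc += 1
            self.handlers[opcode](*operands)           # dispatch to the subroutine
            self.housekeeping()
        return self.output


vm = StackVM([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PRINT",), ("HALT",)])
print(vm.run())  # [5]
```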

Interpreters are conceptually simple because they are modular. Each type of instruction in the macro language corresponds to one module in the interpreter. The modules are written so that macro instructions can be implemented in any order. This independence of order is the key to allowing complex tasks to be programmed simply in the macro language. The programmer does not need to worry about fine-grained interactions among the macro instructions, because there are guaranteed to be none. This leaves the programmer free to consider only large-grained interactions, the ones that are intended to occur.
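
A minimal sketch of this modularity, under the assumption that the interpreter keeps a table from instruction types to handler functions (the names `HANDLERS`, `instruction`, and `run_program` are hypothetical): each handler communicates with the others only through the shared operand stack, so the same handlers work in whatever order a macro program lists the instructions, and `MUL` is added without editing `PUSH` or `ADD`.

```python
from typing import Callable, Dict, List, Tuple

# Registry mapping each instruction type to its module (handler function).
HANDLERS: Dict[str, Callable[[List[int], Tuple[int, ...]], None]] = {}


def instruction(opcode: str):
    """Decorator that registers a handler without touching any other handler."""
    def register(handler):
        HANDLERS[opcode] = handler
        return handler
    return register


@instruction("PUSH")
def op_push(stack: List[int], operands: Tuple[int, ...]) -> None:
    stack.append(operands[0])


@instruction("ADD")
def op_add(stack: List[int], operands: Tuple[int, ...]) -> None:
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)


@instruction("MUL")  # added later; PUSH and ADD are unchanged
def op_mul(stack: List[int], operands: Tuple[int, ...]) -> None:
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


def run_program(program: List[tuple]) -> List[int]:
    stack: List[int] = []
    for opcode, *operands in program:
        HANDLERS[opcode](stack, tuple(operands))
    return stack


print(run_program([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))  # [20]
print(run_program([("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]))  # [14]
```

The sketch does not guard against stack underflow; a real interpreter would check each handler's preconditions.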

A second important feature of layering is that it allows virtual machines to be changed. Typically, there are only two useful applications of this ability:

• Changing the bottom layers and leaving the top. This is useful when the same user programs need to be run on different hardware. For instance, many different hardware manufacturers provide appropriate lower layers for interpreting the UNIX operating system. This allows UNIX programs written on other machines to be executed on theirs.

• Changing the top layers and leaving the bottom. This is useful when the same hardware is needed for different applications. For instance, one application might be best handled with a program written in LISP, while another application program is better suited for implementation in Prolog. By changing the top few layers, one can change a LISP virtual machine into a Prolog virtual machine.
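
Both kinds of change depend on the layers meeting at a fixed interface. The Python sketch below assumes a hypothetical `Memory` interface with `get` and `put`; two interchangeable lower layers implement it in different ways, and two upper-layer programs are written only against the interface, so either layer can be replaced without touching the other. The names and the choice of interface are illustrative, not drawn from any real system.

```python
from abc import ABC, abstractmethod
from typing import Dict, Hashable, List, Tuple


class Memory(ABC):
    """Interface the upper layer relies on; any lower layer must provide it."""

    @abstractmethod
    def get(self, key: Hashable, default: int = 0) -> int: ...

    @abstractmethod
    def put(self, key: Hashable, value: int) -> None: ...


class HashMemory(Memory):
    """One lower layer: a hash table."""

    def __init__(self) -> None:
        self._table: Dict[Hashable, int] = {}

    def get(self, key, default=0):
        return self._table.get(key, default)

    def put(self, key, value):
        self._table[key] = value


class ListMemory(Memory):
    """A different lower layer: linear search over (key, value) pairs."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[Hashable, int]] = []

    def get(self, key, default=0):
        for k, v in self._pairs:
            if k == key:
                return v
        return default

    def put(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))


# Two interchangeable top layers, written only against the Memory interface.
def count_words(mem: Memory, words: List[str]) -> Dict[str, int]:
    for w in words:
        mem.put(w, mem.get(w) + 1)
    return {w: mem.get(w) for w in words}


def running_max(mem: Memory, numbers: List[int]) -> int:
    for n in numbers:
        mem.put("max", max(mem.get("max", n), n))
    return mem.get("max")


for bottom in (HashMemory, ListMemory):  # change the bottom, keep the top
    print(bottom.__name__, count_words(bottom(), ["a", "b", "a"]))
print(running_max(ListMemory(), [3, 9, 2]))  # change the top, keep the bottom: 9
```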

3. Layers of Virtual Mental Machines

If the mind is like a computer, then good design principles for computers should be good design principles for minds. The preceding section described one good design principle, the layering of VMs. Assuming that Nature has designed our minds well, the mind too should have layers of VMs, and moreover, some of those layers should be changeable. The analogy between layered computers and layered minds is quite direct. Electronic hardware corresponds to neuronal tissue. The next layer up might be appropriately modelled by a connection system of the distributed kind. The next level up might be a localist connection system. Perhaps several layers higher, the mind has several VMs: perhaps a VM something like Berwick's (1985) Lparsifal for language, a VM like Kosslyn's (19 ) screen for imagery, and a VM like SOAR (Laird, Rosenbloom, & Newell, 1986) for general problem solving.

This view of the mind may not be particularly startling. I think that most cognitive scientists already believe the mind is organized as multiple layers. Indeed, there have been several investigations of how a program written for a connection system can implement a serial, von Neumann-style VM (see, e.g., Rumelhart, Smolensky, McClelland, & Hinton, 1986).

However, a tacit assumption of the layered view of mind is that the mind consists of a single, unchanging stack of virtual machines, from neurons on up to, e.g., a production system. But this static view of the layers of mind implies that the mind does not use one of the biggest advantages of electronic VM layers: the ability to change layers. It seems much more plausible that the mind can swap VMs and does so whenever it is useful.

Just as in computing systems, there may be two kinds of utility: changing the top layers and changing the bottom layers. Changing the top layers is useful when a person's task changes. The mind might swap in Kosslyn's machine while doing an imagery task, and Newell's machine while doing a problem-solving task. Changing the bottom layers is useful when the memory resources available for a task change. This latter ability might seem rather useless, since it is a safe assumption that the internal memory resources (e.g., short-term memory, long-term memory) change very slowly, if at all. However, if lower VMs are in charge of utilizing external memory resources, such as a piece of scratch paper or a blackboard, then memory resources can change rapidly, so the ability to switch the lower VM layers might in fact be useful.

Indeed, it seems possible to use this notion of changing the external memory resources as a test of the changeable-layers hypothesis. Suppose that the mind uses one lower VM for mental arithmetic and another for regular written arithmetic, and that a second, higher VM executes the arithmetic algorithm. The lower VMs implement the algorithm's primitives (e.g., "what digits are in the hundreds column?"). These assumptions predict that if a subject learns to do both addition and subtraction on paper, then learns how to do mental addition, the subject can immediately do mental subtraction. This massive transfer is caused by swapping out the VM for writing and swapping in the VM for mental "writing," which was learned along with the addition program. If VMs cannot be swapped, then the three learning episodes generate three program/primitive combinations, and mastering mental subtraction would require learning a fourth, new program/primitive combination. Such static, unchanging VM layers predict less than immediate transfer.
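
The hypothetical experiment can be restated as code. In this minimal Python sketch, the higher VM consists of two column-arithmetic algorithms (`column_add`, `column_subtract`) that use only a few primitives, such as reading the digit in a given row and column; two interchangeable lower VMs (`PaperMedium`, `MentalMedium`) supply those primitives over different representations. All names are hypothetical, operands are assumed to be non-negative integers, and subtraction assumes the top operand is at least as large as the bottom one. The sketch shows only the logic of the prediction, not a model of human memory.

```python
from typing import Dict, Protocol, Tuple


class Medium(Protocol):
    """Primitives the arithmetic algorithm assumes (hypothetical interface)."""
    def width(self) -> int: ...
    def digit(self, row: int, col: int) -> int: ...  # col 0 = ones column
    def write_answer(self, col: int, d: int) -> None: ...
    def answer(self) -> int: ...


class PaperMedium:
    """Lower VM for written arithmetic: operands as right-aligned rows of characters."""

    def __init__(self, top: int, bottom: int) -> None:
        w = max(len(str(top)), len(str(bottom))) + 1  # one extra column for a carry
        self.rows = [str(top).rjust(w, "0"), str(bottom).rjust(w, "0")]
        self.answer_row = ["0"] * w

    def width(self) -> int:
        return len(self.answer_row)

    def digit(self, row: int, col: int) -> int:
        return int(self.rows[row][-1 - col])

    def write_answer(self, col: int, d: int) -> None:
        self.answer_row[-1 - col] = str(d)

    def answer(self) -> int:
        return int("".join(self.answer_row))


class MentalMedium:
    """Lower VM for mental arithmetic: digits held in a (row, column)-indexed store."""

    def __init__(self, top: int, bottom: int) -> None:
        self.store: Dict[Tuple[int, int], int] = {
            (row, col): int(ch)
            for row, n in enumerate((top, bottom))
            for col, ch in enumerate(reversed(str(n)))
        }
        self.w = max(len(str(top)), len(str(bottom))) + 1
        self.result: Dict[int, int] = {}

    def width(self) -> int:
        return self.w

    def digit(self, row: int, col: int) -> int:
        return self.store.get((row, col), 0)

    def write_answer(self, col: int, d: int) -> None:
        self.result[col] = d

    def answer(self) -> int:
        return sum(d * 10 ** col for col, d in self.result.items())


# Higher VM: the arithmetic algorithms, written only in terms of Medium primitives.
def column_add(m: Medium) -> int:
    carry = 0
    for col in range(m.width()):
        total = m.digit(0, col) + m.digit(1, col) + carry
        m.write_answer(col, total % 10)
        carry = total // 10
    return m.answer()


def column_subtract(m: Medium) -> int:
    borrow = 0
    for col in range(m.width()):
        diff = m.digit(0, col) - m.digit(1, col) - borrow
        borrow = 1 if diff < 0 else 0
        m.write_answer(col, diff + 10 * borrow)
    return m.answer()


print(column_subtract(PaperMedium(503, 78)))   # written subtraction: 425
print(column_add(MentalMedium(58, 67)))        # mental addition: 125
print(column_subtract(MentalMedium(503, 78)))  # mental subtraction: 425
```

Once `column_subtract` (learned on paper) and `MentalMedium` (learned with mental addition) both exist, mental subtraction needs no new code, which is the analogue of the immediate transfer predicted above.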

Unfortunately, the situation is not nearly so simple. Current theories of transfer (Kieras & Bovair, 1985; Singley & Anderson, in press) hold that the amount of transfer is proportional to the number of productions shared between the old procedure and the new procedure, where transfer is measured by savings in learning time. In these studies, there is only one old procedure, and it does not completely overlap the new procedure. Consequently, some productions in the new procedure must be learned afresh rather than being transferred from the old procedure. In the hypothetical experiment under discussion, there are two old procedures, a written subtraction procedure and a mental addition procedure. If these procedures are represented as homogeneous sets of productions, then the new procedure could be constructed by taking half its productions from each of the two old procedures. This would predict complete savings in learning, which is consistent with the expected result that mental subtraction can be acquired immediately. Thus, the prediction of massive transfer can be obtained without swapping layers.
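
The unlayered account can be illustrated with a simplified shared-productions measure: the predicted savings is the fraction of the new procedure's productions that already exist in some old procedure. The production names below are hypothetical, and this one-line metric is only a simplification of the kind of shared-production count the cited theories use, not their actual models.

```python
from typing import FrozenSet

# Hypothetical production names; realistic models would have many more productions.
WRITTEN_SUB = frozenset({"subtract-column", "borrow", "read-paper-digit", "write-paper-digit"})
MENTAL_ADD = frozenset({"add-column", "carry", "recall-digit", "hold-digit"})
MENTAL_SUB = frozenset({"subtract-column", "borrow", "recall-digit", "hold-digit"})


def predicted_savings(new: FrozenSet[str], *old: FrozenSet[str]) -> float:
    """Fraction of the new procedure's productions already present in some old procedure."""
    known = frozenset().union(*old)
    return len(new & known) / len(new)


print(predicted_savings(MENTAL_SUB, WRITTEN_SUB))              # 0.5
print(predicted_savings(MENTAL_SUB, MENTAL_ADD))               # 0.5
print(predicted_savings(MENTAL_SUB, WRITTEN_SUB, MENTAL_ADD))  # 1.0
```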

However, this unchangeable-layers account of the putatively massive transfer breaks down when analyzed more closely. Transfer is a form of analogy, and in all current theories of analogy, some search is involved in finding appropriate old knowledge to use as the source of the analogy. To perform the transfer in this hypothetical experiment, the search would have to decide which productions to take from the written subtraction procedure and which to take from the mental addition procedure. If this search takes any resources at all, transfer would be less than immediate. However, the search would become trivial (and hence consistent with the expected experimental findings) if the production sets for the old procedures were divided by a boundary line, so that productions below the line dealt with the medium for storage (i.e., the paper or the mind) and productions above the line dealt with the arithmetic algorithm. This is exactly the layering hypothesis, instantiated in a production system formalism.
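
With the boundary line in place, building the new procedure reduces to choosing whole layers. This Python sketch assumes hypothetical production names and represents each old procedure as two labelled layers, `algorithm` and `medium`; the hypothetical `swap_layers` function composes a new procedure without examining individual productions.

```python
from typing import Dict, FrozenSet

# Each procedure's productions are split by a boundary line into two layers.
LayeredProcedure = Dict[str, FrozenSet[str]]

WRITTEN_SUB: LayeredProcedure = {
    "algorithm": frozenset({"subtract-column", "borrow"}),
    "medium": frozenset({"read-paper-digit", "write-paper-digit"}),
}
MENTAL_ADD: LayeredProcedure = {
    "algorithm": frozenset({"add-column", "carry"}),
    "medium": frozenset({"recall-digit", "hold-digit"}),
}


def swap_layers(algorithm_from: LayeredProcedure,
                medium_from: LayeredProcedure) -> LayeredProcedure:
    """Compose a new procedure by taking whole layers; no per-production search."""
    return {"algorithm": algorithm_from["algorithm"], "medium": medium_from["medium"]}


mental_sub = swap_layers(WRITTEN_SUB, MENTAL_ADD)
written_add = swap_layers(MENTAL_ADD, WRITTEN_SUB)
print(sorted(mental_sub["algorithm"] | mental_sub["medium"]))
# ['borrow', 'hold-digit', 'recall-digit', 'subtract-column']
```

By way of contrast, a naive search over which of the eight old productions to reuse would face 2^8 = 256 candidate subsets, whereas pairing an algorithm layer with a medium layer here allows only 2 × 2 = 4 combinations.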

In summary, the empirical discrimination between the hypotheses of layered versus unlayered production sets rests on finding out exactly how immediate the transfer is in the experiment under discussion. If it is virtually instantaneous, then the layering hypothesis is supported. So the situation is actually an empirically delicate one.

4. Summary

This essay has assumed, along with much of the cognitive science community, that the mind consists of layers of virtual machines. This assumption is part of the continued tradition of treating the design principles for good computing systems as serious hypotheses about the mind's design.

The point of the essay was to suggest (and only suggest) that all of the principles of layered computing systems be taken seriously, and in particular, that the ability to change a layer in the stack of virtual machines be considered as a psychological hypothesis. For instance, the top layer might change when the task changes radically, say, from a mental rotation task to a cryptarithmetic task. The bottom layers might change when the memory resources available to the person change.

As shown above, the changeable-layers hypothesis does make predictions, especially about transfer, so it is a testable hypothesis. However, as the discussion above indicated, it will be a difficult hypothesis to test.

The layering hypothesis opens up a whole host of issues:

• Where do VMs come from? Are they learned?

• What happens while a VM is being swapped? Where does consciousness (or attention) stand? At a meta-level?

• Does the layering hypothesis allow us to reconcile old controversies, such as the conflict between problem-solving architectures, which seem to require working memories of arbitrary capacity, and short-term memory architectures, with their severe capacity limitations?

• Does compiling from one layer down to the next explain practice effects, such as decreasing competition for resources and decreasing cognitive penetrability?

• Does learning a new VM explain how people can recover from some neurological traumas?

My hope is that the mind is much simpler than we ever thought. We only thought it was complex because we tried to model it without using enough layers, which is about as hard as trying to write a Tower of Hanoi problem solver in microcode.